Antipattern: Using Indexes Without a Plan

Let's see how using indexes inappropriately affects a query.

We'll cover the following

No indexes
Too many indexes
When no index can help

When we choose our indexes by guessing, we inevitably make some wrong choices. Misunderstandings about when to use indexes lead to mistakes in one of these three categories:

Defining no indexes or not enough indexes
Defining too many indexes, or indexes that don’t help
Running queries that no index can help

No indexes#

We commonly read that a database incurs overhead costs as it keeps an index up-to-date. Each time we use INSERT, UPDATE, or DELETE, the database has to update the index data structures for that table to be consistent so that our subsequent searches are able to use these indexes to find the right set of rows reliably.

We’re trained to think that overheads mean waste. Thus, when we read that databases incur overheads to keep their indexes updated, we instinctively want to eliminate those overheads. Some developers conclude that the remedy is to eliminate the indexes themselves. This advice is common, but it ignores the fact that indexes have benefits that justify the overhead cost they cause.

Not all overhead is waste. Don’t our companies employ administrative staff, legal professionals, and accountants? Don’t they pay for facilities even though these expenses don’t directly contribute to generating revenue? Of course, they do, and it is because these expenses contribute to the success of companies in significant ways.

In a typical application, we’ll run hundreds of queries against a table to update it for everyone. Every time we run a query that uses an index, we win back the overhead that went into maintaining that index.

An index can also help an UPDATE or DELETE statement by finding the desired rows quickly. For example, the index on the bug_id primary key helps the following statement:

Enter to Rename, Shift+Enter to Preview

Updating Bugs table

Run the query in the following playground to see how the data is updated.

Enter to Rename, Shift+Enter to Preview

Retrieving data after updating Bugs table

A statement that searches an unindexed column performs a full table scan to find a match.

Enter to Rename, Shift+Enter to Preview

An unindexed column performing a full table scan to find a match

Too many indexes#

We benefit from an index only if we run queries that use that index. There’s no benefit to creating indexes that we won’t use. Here are some examples:

CREATE TABLE Bugs (
bug_id SERIAL PRIMARY KEY,
date_reported DATE NOT NULL,
summary VARCHAR(80) NOT NULL,
status VARCHAR(10) NOT NULL,
hours NUMERIC(9,2),
INDEX (bug_id),                         --1
INDEX (summary),                        --2
INDEX (hours),                          --3
INDEX (bug_id, date_reported, status)   --4
);

Enter to Rename, Shift+Enter to Preview

Creating Bugs table

Indexes aren’t standard

In the previous example, there are several useless indexes:

bug_id: Most databases create an index automatically for a primary key, so it’s redundant to define another index for this. This means that such an index would only cause extra overhead. Each database brand has its own rules for when to create an index automatically, so we need to read the documentation for the database that we use.
summary: An index for a long string data type like VARCHAR(80) is larger than an index for a more compact data type. We’re also not likely to run queries that search or sort by the entiresummary column.
hours: This is another example of a column that we’re probably not going to need to search specific values from.
bug_id, date_reported, status: There are good reasons to use compound indexes, but many people create compound indexes that are redundant or seldom used. Besides avoiding this, it is also important to remember that the correct ordering of columns in a compound index is essential; we should use the columns in a left-to-right order in search criteria, join criteria, or sorting order.

Some people create indexes on every column and combination of columns because they don’t know which indexes will benefit their queries. However, this overlooks the fact that when they cover a database table with indexes, they incur a lot of overhead with no assurance of payoff.

When no index can help#

The next type of mistake is to run a query that can’t use any index. Developers create more and more indexes, trying to find some magical combination of columns or index options to make their query run faster, but it is all in vain.

We can think of a database index using the analogy of a telephone book. If I ask you were asked to look up everyone in a telephone book whose last name was Charles, it would be an easy task, since all the people with the same last name are listed together in a telephone book.

However, if we were asked to look up everyone in the telephone book whose first name was Charles, then the order of the names in the book wouldn’t benefit us. Anyone could have that first name, regardless of their last name, so we would have to search through the entire book line by line.

The telephone book is ordered by the last name and then by the first name, just like a compound database index on last_name, first_name. This index doesn’t help us search by the first name.

Enter to Rename, Shift+Enter to Preview

Creating an index TelephoneBook

Some examples of queries that can’t benefit from this index include the following:

SELECT * FROM Accounts ORDER BY first_name, last_name;

This query shows the telephone book scenario. If we create a compound index for the two columns, last_name followed by first_name (as in a telephone book), the index doesn’t help us sort primarily by first_name.

SELECT * FROM Bugs WHERE MONTH(date_reported) = 4;

We could create an index for the date_reported column, but even then the order of the index doesn’t help us search by month. The order of this index is based on the entire date, starting with the year. But each year contains the fourth month, so the rows where the month is equal to 4 are scattered throughout the index.

Some databases support indexes on expressions or indexes on generated columns, as well as indexes on plain columns. But we have to define an index prior to using it, and the index only helps for the expression that we specify in its definition.

SELECT * FROM Bugs WHERE last_name = 'Charles' OR first_name = 'Charles';

Thus, we’re back to the problem in the example above: the rows that contain our specified first name are scattered unpredictably according to the order of the index that we have defined. The result of the previous query is the same as the result of the following:

SELECT * FROM Bugs WHERE last_name = 'Charles' 
UNION 
SELECT * FROM Bugs WHERE first_name = 'Charles';

The index in our example only helps us find our specified last name; it doesn’t help us find our specified first name.

Low-selectivity indexes

SELECT * FROM Bugs WHERE description LIKE ‘%crash%’;

Because the pattern in this search predicate could occur anywhere in the string, the sorted index data structure is unable to help us.

Synopsis: Index Shotgun

Solution: MENTOR Your Indexes

Mark as Completed

Report an Issue

Introduction

Logical Antipattern - Jaywalking

Logical Antipattern - Naive Trees

Logical Antipattern - ID Required

Logical Antipattern - Keyless Entry

Assessment: Logical Antipatterns - Part 1

Logical Antipattern - Entity-Attribute-Value

Logical Antipattern - Polymorphic Associations

Logical Antipattern - Multicolumn Attributes

Logical Antipattern - Metadata Tribbles

Assessment: Logical Antipatterns - Part 2

Physical Antipattern - Rounding Errors

Physical Antipattern - 31 Flavors

Physical Antipattern - Phantom Files

Physical Antipattern - Index Shotgun

Assessment: Physical Antipatterns

Query Antipattern - Fear of the Unknown

Query Antipattern - Ambiguous Groups

Query Antipattern - Random Selection

Query Antipattern - Poor Man’s Search Engine

Query Antipattern - Spaghetti Query

Query Antipattern - Implicit Columns

Assessment: Query Antipattern

Application Development Antipattern - Readable Passwords

Application Development Antipattern - SQL Injection

Application Development Antipattern - Pseudokey Neat-Freak

Application Development Antipattern - See No Evil

Application Development Antipattern - Diplomatic Immunity

Application Development Antipattern - Magic Beans

Assessment: Application Development Antipattern

Conclusion

Antipattern: Using Indexes Without a Plan

No indexes#

Too many indexes#

When no index can help#